Recitation Review of Parameter Estimation, 10-701/15-781 Machine Learning, Fall 2009

Author

  • Shay Cohen
Abstract

This is a set of notes summarizing what we talked about in the second recitation. They are not meant to be rigorous, and are written informally, following what was discussed in class.

The Likelihood Function

Consider the probability distribution for the following unfair coin-toss random variable:

P(X = 0) = 1/3, P(X = 1) = 2/3

This is an example of a probability distribution, and we are used to thinking of it as a function of the possible outcomes of the coin, or of some other measurement that we make in the real world. But what if we ask a different question: instead of asking "what is the probability of getting a tail or a head," we ask, "if we don't know the bias of the coin, can we get a probability distribution which will give us some clue about its bias from what we observe?"

More formally, instead of having a single probability distribution, we define a family of distributions of which the above P is just one instance. One natural family we could define for unfair coin tosses is:

{Pθ | θ ∈ [0, 1]}, where Pθ(X = 0) = θ and Pθ(X = 1) = 1 − θ.

Now, our job is this: assuming we observe a few coin tosses, decide what is the best Pθ(·) from the above family. Notice that with "parametric methods" we will always be searching for a Pθ(·) where θ ∈ Θ, and choosing that Pθ(·) is equivalent to choosing a θ. There are other families of models which don't "parameterize" our distributions so easily; we will discuss some of these "non-parametric" ideas later in class.

So, keep in mind our goal: to find some θ which conforms to what we observe. In order to do that, we change our view of P: instead of looking at it as a function of X, we look at it as a function of θ, where X is fixed to our observations. More formally, we define the "likelihood function:"

L(θ; x_1, ..., x_n) = ∏_{i=1}^{n} Pθ(X = x_i)
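To make the definition concrete, here is a minimal Python sketch (not part of the original notes; the function and variable names are illustrative) that evaluates this likelihood for a sequence of observed tosses under the family above, and then picks the value of θ on a grid that maximizes it:

    # Evaluate L(theta; x_1, ..., x_n) for the Bernoulli family
    # P_theta(X = 0) = theta, P_theta(X = 1) = 1 - theta.
    def likelihood(theta, observations):
        result = 1.0
        for x in observations:
            result *= theta if x == 0 else 1.0 - theta
        return result

    # Six observed tosses; X = 0 occurs twice.
    observations = [1, 0, 1, 1, 0, 1]

    # Crude maximum-likelihood estimate: search a grid of candidate thetas.
    grid = [i / 1000.0 for i in range(1001)]
    theta_hat = max(grid, key=lambda t: likelihood(t, observations))
    print(theta_hat)  # ~0.333, the empirical fraction of zeros

As one would hope, the maximizing θ is (up to grid resolution) the fraction of tosses that came up X = 0, which is exactly the maximum-likelihood estimate for this family.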


Similar resources

Recitation Review of Probability, 10-701/15-781 Machine Learning, Fall 2009

This is a set of notes and exercises, summarizing and illustrating concepts that we discussed during the first recitation.

Full text

10th Recitation: Review of Structure Learning for Bayesian Networks and k-Means, and a Little Bit About EM, 10-701/15-781 Machine Learning, Fall 2009

This is a set of notes, summarizing what we talked about in the 10th recitation. They are not meant to be rigorous, and are written informally, following what was discussed in class. Structure Learning of Bayesian Networks Up until now we focused on learning how to estimate the conditional probability tables (CPTs) of Bayesian networks, given that the structure is fixed. Sometimes the structu...

Full text

Machine learning algorithms in air quality modeling

Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. The recent advances in statistical modeling based on machine learning approaches have emerged as a solution to tackle these issues. It is a fact that input variable type largely affec...

Full text

Spectral learning of latent-variable PCFGs: algorithms and sample complexity

We introduce a spectral learning algorithm for latent-variable PCFGs (Matsuzaki et al., 2005; Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides statistically consistent parameter estimates. Our result rests on three theorems: the first gives a tensor form of the inside-outside algorithm for PCFGs; the second shows that the required tensors ...

Full text

A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation

We present a theoretical analysis of online parameter tuning in statistical machine translation (SMT) from a coactive learning view. This perspective allows us to give regret and generalization bounds for latent perceptron algorithms that are common in SMT, but fall outside of the standard convex optimization scenario. Coactive learning also introduces the concept of weak feedback, which we app...

Full text


Publication date: 2009